A Primer on Generative Adversarial Networks by Sanaa Kaddoura

Author: Sanaa Kaddoura
Language: eng
Format: epub
ISBN: 9783031326615
Publisher: Springer International Publishing


Data Collection and Preparation

Various GAN models can be used in deep fake video generation, such as speech-to-video or video-to-video GANs. Speech-to-video GANs generate videos of talking faces from audio files and images of the target. Video-to-video GANs, on the other hand, generate counterfeit videos of a target individual and require footage of both the source and the target person; the GAN then swaps faces and voices in a video. This section explains video-to-video GANs.

To create a deep fake video, a dataset of videos of a real person is fed into the GAN. One option is to collect a large and diverse dataset of real videos as the basis for generating fake ones. The dataset should show sufficient variability in viewpoints, lighting conditions, backgrounds, and other relevant factors, and it should be annotated to facilitate training and evaluation of the model. It must contain different videos of both the source and the target speaker, since the voice and image of target speaker B will replace the voice and image of source speaker A. Another option is to use public datasets. Several video datasets are available for this purpose, such as FaceForensics++ [5], which contains real and deep fake videos of multiple individuals. The VoxCeleb dataset [6] is another viable option: it includes over 1,000 hours of audio and video recordings of individuals, making it suitable for training deep fakes that involve both audio and video. Among the many datasets available online, VoxCeleb fits this problem well; it consists of short clips of human speech extracted from interview videos uploaded to YouTube.
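As a minimal sketch of organizing such a dataset, assume the collected clips are stored in per-speaker folders; the layout data/source_A and data/target_B and the function name index_videos are hypothetical choices for illustration, not part of the original text:

from pathlib import Path

# Hypothetical layout: data/source_A/*.mp4 and data/target_B/*.mp4
DATA_ROOT = Path("data")
SPEAKERS = ["source_A", "target_B"]

def index_videos(root, speakers):
    """Collect the video files available for each speaker."""
    index = {}
    for speaker in speakers:
        clips = sorted((root / speaker).glob("*.mp4"))
        if not clips:
            raise FileNotFoundError(f"No clips found for speaker '{speaker}'")
        index[speaker] = clips
    return index

if __name__ == "__main__":
    for speaker, clips in index_videos(DATA_ROOT, SPEAKERS).items():
        print(f"{speaker}: {len(clips)} clips")

An index like this makes it easy to verify that both speakers are represented before any preprocessing begins.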

Once the dataset is collected, it must be preprocessed and formatted for training the GAN model. This includes resizing the videos to a consistent resolution, normalizing the pixel values, cropping out unwanted parts of the frame such as black borders, and splitting the videos into individual frames. Splitting into frames is essential, since working with entire videos is computationally intensive and time-consuming. The frames can be further augmented to increase the variability of the training data by applying random rotations, zooms, and flips.
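A sketch of these preprocessing steps, assuming OpenCV (cv2) and NumPy are available; the 256×256 resolution, crop margin, and augmentation ranges are illustrative assumptions rather than values given in the text:

import cv2
import numpy as np

def video_to_frames(video_path, size=(256, 256), crop_border=10):
    """Split a video into frames, crop borders, resize, and normalize."""
    frames = []
    cap = cv2.VideoCapture(str(video_path))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Crop a fixed margin to drop unwanted parts such as black borders.
        h, w = frame.shape[:2]
        frame = frame[crop_border:h - crop_border, crop_border:w - crop_border]
        # Resize to a consistent resolution.
        frame = cv2.resize(frame, size, interpolation=cv2.INTER_AREA)
        # Normalize pixel values from [0, 255] to [-1, 1], a common GAN convention.
        frame = frame.astype(np.float32) / 127.5 - 1.0
        frames.append(frame)
    cap.release()
    return frames

def augment(frame, rng=np.random.default_rng()):
    """Apply random flips, rotations, and zooms to increase variability."""
    if rng.random() < 0.5:
        frame = cv2.flip(frame, 1)  # horizontal flip
    h, w = frame.shape[:2]
    angle = rng.uniform(-10, 10)  # small random rotation in degrees
    zoom = rng.uniform(0.9, 1.1)  # random zoom factor
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, zoom)
    return cv2.warpAffine(frame, M, (w, h), borderMode=cv2.BORDER_REFLECT)

Normalizing to [-1, 1] pairs naturally with a tanh output activation in the generator, which is a common convention in GAN training.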


